Recent deep learning (DL) applications are mostly built on top of DL libraries, whose quality assurance is critical for the reliable deployment of DL applications. Several techniques have therefore been proposed to test DL libraries by generating DL models as test inputs. These techniques feed the generated models to a DL library for inference, thereby exercising the library modules involved in model execution. However, their test effectiveness is limited by the diversity of the generated models. Our study finds that these techniques cover at most 11.7% of layer pairs (i.e., call sequences between two layer APIs) and 55.8% of layer parameters (e.g., "padding" in Conv2D). As a result, existing techniques may miss many bugs triggered by specific layer pairs and parameters. Given this limitation of existing DL library testing techniques, we propose MEMO to efficiently generate diverse DL models by exploring layer types, layer pairs, and layer parameters. MEMO (1) designs an initial-model reduction technique that improves test efficiency without compromising model diversity, and (2) designs a set of mutation operators for a customized Markov Chain Monte Carlo (MCMC) algorithm to explore new layer types, layer pairs, and layer parameters. We evaluate MEMO on seven popular DL libraries: four for model execution (TensorFlow, PyTorch, MXNet, and ONNX) and three for model conversion (Keras-MXNet, TF2ONNX, ONNX2PyTorch). The evaluation results show that MEMO outperforms recent works, covering 10.3% more layer pairs, 15.3% more layer parameters, and 2.3% more library branches. Moreover, MEMO detects 29 new bugs in the latest release versions of the DL libraries, of which 17 have been confirmed by the developers and 5 of the confirmed bugs have been fixed.
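To make the search concrete, here is a minimal Python sketch of what a Metropolis-Hastings-style mutation loop over generated models could look like; the operator names, the coverage-based score, and the `mutate` callable are hypothetical illustrations, not MEMO's actual implementation:

```python
import math
import random

# Hypothetical operators exploring new layer types, layer pairs, and layer parameters.
MUTATION_OPS = ["insert_layer", "replace_layer", "perturb_parameter"]

def mcmc_model_generation(seed_model, coverage_of, mutate, iterations=1000, temperature=1.0):
    """Metropolis-Hastings-style walk over DL models used as test inputs.

    seed_model:  the (reduced) initial model that starts the chain
    coverage_of: returns a score for a model, e.g. how many new layer
                 pairs / layer parameters it exercises
    mutate:      applies a named mutation operator to a model
    """
    current, current_score = seed_model, coverage_of(seed_model)
    for _ in range(iterations):
        op = random.choice(MUTATION_OPS)      # propose a mutation
        candidate = mutate(current, op)
        score = coverage_of(candidate)
        delta = score - current_score
        # Always accept improvements; accept regressions with a probability
        # that decays in the coverage loss (the classic Metropolis rule).
        if delta >= 0 or random.random() < math.exp(delta / temperature):
            current, current_score = candidate, score
        yield current                          # feed to the DL library under test
```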
Model compression can significantly reduce the size of deep neural network (DNN) models, so that large, complex models can be deployed on resource-constrained mobile and IoT devices after compression. However, compression often introduces deviated behaviors into the compressed model: the original and the compressed models output different prediction results for the same input. It is therefore crucial to warn developers and help them comprehensively evaluate the possible consequences of such behaviors before deployment. To this end, we propose TriggerFinder, a novel, effective, and efficient testing approach that automatically identifies inputs triggering deviated behaviors in compressed models. Given an input i as a seed, TriggerFinder iteratively applies a series of mutation operations to change i until the resulting input triggers a deviated behavior. However, compressed models usually hide their architecture and gradient information; without such internal information as guidance, it becomes hard to trigger deviated behaviors effectively and efficiently. To tackle this challenge, we propose a novel fitness function to determine which mutated inputs are closer to triggering deviated predictions. Furthermore, TriggerFinder models this search problem as a Markov chain process and leverages the Metropolis-Hastings algorithm to guide the selection of mutation operators. We evaluated TriggerFinder on 18 compressed models with two datasets. The experimental results show that TriggerFinder successfully finds triggering inputs for all seed inputs, whereas the baselines fail in some cases. In terms of efficiency, TriggerFinder is 5.2x-115.8x as fast as the baselines, and it needs 51.8x-535.6x fewer queries than the baselines to find one triggering input.
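A minimal sketch of this style of search, with a generic L1 distance between the two models' output vectors standing in for the paper's fitness function, and a simple operator-weighting scheme standing in for the Metropolis-Hastings guidance; all names here are illustrative assumptions:

```python
import random
import numpy as np

def deviation_fitness(original_predict, compressed_predict, x):
    """Illustrative fitness: L1 distance between the two models' output
    probability vectors; larger means closer to a deviated prediction."""
    p, q = np.asarray(original_predict(x)), np.asarray(compressed_predict(x))
    return float(np.abs(p - q).sum())

def find_trigger(seed, mutate_ops, original_predict, compressed_predict, max_queries=10000):
    x = seed
    best = deviation_fitness(original_predict, compressed_predict, x)
    weights = {op: 1.0 for op in mutate_ops}   # helpful operators get proposed more often
    for _ in range(max_queries):
        op = random.choices(list(weights), weights=list(weights.values()))[0]
        candidate = op(x)
        if np.argmax(original_predict(candidate)) != np.argmax(compressed_predict(candidate)):
            return candidate                   # deviated behavior triggered
        fit = deviation_fitness(original_predict, compressed_predict, candidate)
        if fit > best:                         # hill-climb toward a deviation
            x, best = candidate, fit
            weights[op] *= 1.5                 # reward the operator that helped
    return None
```

Note that the search queries the models only through their predictions, matching the black-box setting in which architecture and gradients are hidden.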
Aligning users across networks using graph representation learning has been found effective where the alignment is accomplished in a low-dimensional embedding space. Yet, achieving highly precise alignment is still challenging, especially when nodes with long-range connectivity to the labeled anchors are encountered. To alleviate this limitation, we purposefully designed WL-Align, which adopts a regularized representation learning framework to learn distinctive node representations. It extends the Weisfeiler-Lehman Isomorphism Test and learns the alignment in alternating phases of "across-network Weisfeiler-Lehman relabeling" and "proximity-preserving representation learning". The across-network Weisfeiler-Lehman relabeling is achieved through iterating anchor-based label propagation and similarity-based hashing to exploit the known anchors' connectivity to different nodes in an efficient and robust manner. The representation learning module preserves the second-order proximity within individual networks and is regularized by the across-network Weisfeiler-Lehman hash labels. Extensive experiments on real-world and synthetic datasets have demonstrated that our proposed WL-Align outperforms the state-of-the-art methods, achieving significant performance improvements in the "exact matching" scenario. Data and code of WL-Align are available at https://github.com/ChenPengGang/WLAlignCode.
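The relabeling step can be pictured with a short sketch: each round hashes a node's current label together with the sorted multiset of its neighbours' labels, so nodes whose neighbourhoods relate identically to the shared anchor labels end up with identical labels across the two networks. This is a simplified illustration, not WL-Align's exact propagation-plus-hashing scheme:

```python
import hashlib

def wl_relabel(adjacency, labels, rounds=3):
    """adjacency: node -> list of neighbour nodes
    labels:    node -> current label string; anchor nodes are seeded with
               labels shared across the two networks, others with a default.
    """
    for _ in range(rounds):
        new_labels = {}
        for node, neighbours in adjacency.items():
            signature = labels[node] + "|" + ",".join(sorted(labels[n] for n in neighbours))
            new_labels[node] = hashlib.md5(signature.encode()).hexdigest()[:8]
        labels = new_labels
    return labels
```

In WL-Align, hash labels of this kind then regularize the proximity-preserving representation learning phase, and the two phases alternate.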
Three main points: 1. Data Science (DS) will be increasingly important to heliophysics; 2. Methods of heliophysics science discovery will continually evolve, requiring the use of learning technologies [e.g., machine learning (ML)] that are applied rigorously and that are capable of supporting discovery; and 3. To grow with the pace of data, technology, and workforce changes, heliophysics requires a new approach to the representation of knowledge.
We propose an extrinsic Bayesian optimization (eBO) framework for general optimization problems on manifolds. Bayesian optimization algorithms build a surrogate of the objective function by employing Gaussian processes and quantify the uncertainty in that surrogate by deriving an acquisition function. This acquisition function represents the probability of improvement based on the kernel of the Gaussian process, which guides the search in the optimization process. The critical challenge for designing Bayesian optimization algorithms on manifolds lies in the difficulty of constructing valid covariance kernels for Gaussian processes on general manifolds. Our approach is to employ extrinsic Gaussian processes by first embedding the manifold onto some higher dimensional Euclidean space via equivariant embeddings and then constructing a valid covariance kernel on the image manifold after the embedding. This leads to efficient and scalable algorithms for optimization over complex manifolds. Simulation studies and real data analysis are carried out to demonstrate the utility of our eBO framework by applying it to various optimization problems over manifolds such as the sphere, the Grassmannian, and the manifold of positive definite matrices.
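The key construction admits a very small sketch: compose any valid Euclidean kernel with the embedding. For the unit sphere the equivariant embedding is simply the inclusion into R^3; the names and the RBF choice below are illustrative assumptions, not the paper's specific kernels:

```python
import numpy as np

def extrinsic_kernel(x, y, embed, lengthscale=1.0):
    """A covariance on the manifold obtained by pushing points through an
    (equivariant) embedding into Euclidean space and applying an RBF kernel
    there; validity is inherited from the Euclidean kernel."""
    ex, ey = embed(x), embed(y)
    return float(np.exp(-np.sum((ex - ey) ** 2) / (2 * lengthscale ** 2)))

sphere_embed = lambda x: np.asarray(x)   # S^2 already sits inside R^3

# A Gaussian process with this kernel is then used as in ordinary Bayesian
# optimization: fit it on observed (point, objective) pairs, derive an
# acquisition function, and maximize the acquisition over manifold samples.
```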
Many state-of-the-art natural language understanding (NLU) models are based on pretrained neural language models. These models often make inferences using information from multiple sources. An important class of such inferences comprises those that require both background knowledge, presumably contained in a model's pretrained parameters, and instance-specific information that is supplied at inference time. However, the integration and reasoning abilities of NLU models in the presence of multiple knowledge sources have been largely understudied. In this work, we propose a test suite of coreference resolution tasks that require reasoning over multiple facts. Our dataset is organized into subtasks that differ in terms of which knowledge sources contain relevant facts. We evaluate state-of-the-art coreference resolution models on our dataset. Our results indicate that several models struggle to reason on-the-fly over knowledge observed both at pretrain time and at inference time. However, with task-specific training, a subset of models demonstrates the ability to integrate certain knowledge types from multiple sources.
This work addresses fair generative models. Dataset biases have been a major cause of unfairness in deep generative models. Previous work proposed to augment large, biased datasets with small, unbiased reference datasets. Under this setup, a weakly-supervised approach has been proposed that achieves state-of-the-art quality and fairness in generated samples. In our work, based on this setup, we propose a simple yet effective approach. Specifically, first, we propose fairTL, a transfer learning approach to learning fair generative models. Under fairTL, we pre-train the generative model on the available large, biased datasets and subsequently adapt the model using the small, unbiased reference dataset. We find that fairTL can learn expressive sample generation during pre-training, thanks to the large (biased) dataset. This knowledge is then transferred to the target model during adaptation, which also learns to capture the underlying fair distribution of the small reference dataset. Second, we propose fairTL++, where we introduce two additional innovations to improve upon fairTL: (i) multiple feedback and (ii) Linear-Probing followed by Fine-Tuning (LP-FT). Taking one step further, we consider an alternative, challenging setup in which only a pre-trained (potentially biased) model is available but the dataset used to pre-train it is inaccessible. We demonstrate that our proposed fairTL and fairTL++ remain very effective under this setup. We note that previous work requires access to the large, biased datasets and is incapable of handling this more challenging setup. Extensive experiments show that fairTL and fairTL++ achieve state-of-the-art performance in both the quality and fairness of generated samples. The code and additional resources can be found at bearwithchris.github.io/fairTL/.
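Linear-Probing followed by Fine-Tuning is itself a generic recipe; below is a hedged sketch of it in its generic classifier form (fairTL++ applies the idea during adaptation of the pre-trained generative model; the helper names, learning rates, and setting here are only illustrative):

```python
import torch

def _run(model, head, loader, loss_fn, opt, epochs):
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(head(model(x)), y).backward()
            opt.step()

def lp_ft(model, head, loader, loss_fn, lp_epochs=5, ft_epochs=5):
    # Phase 1 (linear probing): freeze pre-trained weights and train only the
    # head, so the head aligns with the pre-trained features before they move.
    for p in model.parameters():
        p.requires_grad = False
    _run(model, head, loader, loss_fn,
         torch.optim.Adam(head.parameters(), lr=1e-3), lp_epochs)
    # Phase 2 (fine-tuning): unfreeze everything and adapt at a lower rate,
    # which distorts the pre-trained features less than fine-tuning alone.
    for p in model.parameters():
        p.requires_grad = True
    _run(model, head, loader, loss_fn,
         torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-5),
         ft_epochs)
```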
Unsupervised foreground-background segmentation aims at extracting salient objects from cluttered backgrounds, where Generative Adversarial Network (GAN) approaches, especially layered GANs, show great promise. However, without human annotations, they are typically prone to produce foreground and background layers with non-negligible semantic and visual confusion, dubbed "information leakage", resulting in notable degeneration of the generated segmentation mask. To alleviate this issue, we propose a simple-yet-effective explicit layer independence modeling approach, termed Independent Layer Synthesis GAN (ILSGAN), pursuing independent foreground-background layer generation by encouraging their discrepancy. Specifically, it targets minimizing the mutual information between visible and invisible regions of the foreground and background to spur interlayer independence. Through in-depth theoretical and experimental analyses, we justify that explicit layer independence modeling is critical to suppressing information leakage and contributes to impressive segmentation performance gains. Also, our ILSGAN achieves strong state-of-the-art generation quality and segmentation performance on complex real-world data.
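The mutual-information term can be estimated with a learned statistics network; here is a sketch using the Donsker-Varadhan (MINE-style) bound, which may differ from ILSGAN's exact estimator (`stat_net` is an assumed two-input scoring network):

```python
import torch

def mi_lower_bound(stat_net, a, b):
    """Donsker-Varadhan lower bound on I(a; b): joint pairs are scored
    against shuffled (marginal) pairs."""
    joint = stat_net(a, b).mean()
    marginal = torch.exp(stat_net(a, b[torch.randperm(b.size(0))])).mean().log()
    return joint - marginal

# Training is adversarial: the statistics network maximizes the bound to
# tighten the MI estimate, while the generator minimizes it between, e.g.,
# visible_fg = mask * fg and invisible_fg = (1 - mask) * fg, pushing the
# foreground and background layers toward independence.
```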
Multiview self-supervised representation learning is rooted in exploring semantic consistency across data with complex intra-class variation. Such variation is not directly accessible and is therefore simulated with data augmentations. However, commonly adopted augmentations are handcrafted and limited to simple geometrical and color changes, which cannot cover the abundant intra-class variation. In this paper, we propose to extract the underlying data variation from datasets and construct a novel augmentation operator, named local manifold augmentation (LMA). LMA is achieved by training an instance-conditioned generator to fit the distribution on the local manifold of the data and sampling multiview data with it. LMA can create an infinite number of data views, preserve semantics, and simulate complicated variations in object pose, viewpoint, lighting condition, background, etc. Experiments show that with LMA integrated, self-supervised learning methods such as MoCov2 and SimSiam gain consistent improvements on prevalent benchmarks including CIFAR10, CIFAR100, STL10, ImageNet100, and ImageNet. Furthermore, LMA leads to representations that are more invariant to viewpoint, object pose, and illumination changes and more robust to the real distribution shifts reflected by ImageNet-V2, ImageNet-R, ImageNet Sketch, etc.
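Conceptually, LMA swaps the handcrafted transform for samples from an instance-conditioned generator; a hedged sketch follows (the generator interface G(z, x) and all names are assumptions for illustration, not the paper's API):

```python
import torch

def lma_views(generator, x, latent_dim=128, n_views=2):
    """Draw augmented views of a batch x from the local data manifold:
    the pre-trained generator is conditioned on x, and latent noise z
    controls the variation (pose, viewpoint, lighting, background, ...)."""
    views = []
    for _ in range(n_views):
        z = torch.randn(x.size(0), latent_dim, device=x.device)
        views.append(generator(z, x))
    return views

# The generated views can replace or complement handcrafted augmentations
# when forming positive pairs for MoCov2, SimSiam, and similar methods.
```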
Transformer models have achieved great success across many NLP problems. However, previous studies in automated ICD coding concluded that these models fail to outperform some of the earlier solutions such as CNN-based models. In this paper we challenge this conclusion. We present a simple and scalable method to process long text with the existing transformer models such as BERT. We show that this method significantly improves the previous results reported for transformer models in ICD coding, and is able to outperform one of the prominent CNN-based methods.
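One standard way to realize such a scheme is shown in the hedged sketch below; the chunk length, stride, and max-pooling are common choices, not necessarily the paper's exact configuration:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def encode_long(text, chunk_len=510, stride=255):
    """Split a long document into overlapping chunks that fit BERT's
    512-token window, encode each, and pool the [CLS] vectors into a
    single document representation for the ICD classification head."""
    ids = tok(text, add_special_tokens=False)["input_ids"]
    chunks = [ids[i:i + chunk_len] for i in range(0, max(len(ids), 1), stride)]
    cls_vectors = []
    with torch.no_grad():
        for chunk in chunks:
            inp = torch.tensor([[tok.cls_token_id] + chunk + [tok.sep_token_id]])
            cls_vectors.append(bert(inp).last_hidden_state[:, 0])
    return torch.cat(cls_vectors).max(dim=0).values   # max-pool over chunks
```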